Reconsidering Complex Branch Predictors
Abstract
To sustain instruction throughput rates in more aggressively clocked microarchitectures, microarchitects have incorporated larger and more complex branch predictors into their designs, taking advantage of the increasing numbers of transistors available on a chip. Unfortunately, because of penalties associated with their implementations, the extra accuracy provided by many branch predictors does not produce a proportionate increase in performance. Specifically, we show that the techniques used to hide the latency of a large and complex branch predictor do not scale well and will be unable to sustain IPC for deeper pipelines. We investigate a different way to build large branch predictors. We propose an alternative predictor design that completely hides predictor latency so that accuracy and hardware budget are the only factors that affect the efficiency of the predictor. Our simple design allows the predictor to be pipelined efficiently by avoiding difficulties introduced by complex predictors. Because this predictor eliminates the penalties associated with complex predictors, overall performance exceeds that of even the most accurate known branch predictors in the literature at large hardware budgets. We conclude that as chip densities increase in the next several years, the accuracy of complex branch predictors must be weighed against the performance benefits of simple branch predictors.
Related resources
Area-Aware Pipeline Gating for Embedded Processors
Modern embedded processors use small and simple branch predictors to improve performance. Using complex and accurate branch predictors, while desirable, is not possible as such predictors impose high power and area overhead which is not affordable in an embedded processor. As a result, for some applications, misprediction rate can be high. Such mispredictions result in energy wasted down the mi...
LAFI: Look-Ahead Mechanism for Energy-Efficient Branch Prediction
To feed the execution core with uninterrupted instruction streams, modern processors access complex branch predictors every clock cycle, incurring high energy consumption. In this paper, we conduct a systematic study on the energy consumption of representative branch predictors, and identify several sources of energy inefficiency. Based on this study, we introduce an improved branch predictor d...
TAGE-SC-L Branch Predictors
The TAGE predictor [12] is considered one of the most storage-effective global branch/path history predictors. It has been shown that, combined with small adjunct predictors such as a statistical corrector (SC for short) and/or a loop predictor (L for short) [11, 10], TAGE can be even more effective. In this study, we explore the performance limits of these TAGE-SC-L predictors for respectivel...
Latency Tolerant Branch Predictors
The access latency of branch predictors is a well-known problem in fetch engine design. Prediction overriding techniques are commonly accepted as a way to overcome this problem. However, prediction overriding requires a complex recovery mechanism to discard the wrong speculative work based on overridden predictions. In this paper, we show that stream and trace predictors, which use long basic prediction...
Complex Load-Value Predictors: Why We Need Not Bother
Memory accesses continue to represent a major performance bottleneck and much remains to be done to tolerate their latencies. A large body of work exists that presents load-value prediction as an effective means to hide some of the memory latency. To increase the prediction accuracy and hence the performance, researchers have proposed more complex and larger predictor designs. This paper re-eva...